library(plotly)
library(manhattanly)
set.seed(81706)
In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a "bell curve") even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions.
Source: Wikipedia
Another way to look at the Central Limit Theorem is that as you increase the number of samples taken from a non-normal distribution, the distribution of the sample means tends toward normality. In this presentation, we're going to explore exactly that.
We're going to use a uniform distribution for our model, with a sample size n of 100; ns represents the number of samples taken.
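The original figures appear to have been drawn interactively (plotly, and likely manhattanly's qqly for the quantile plots), but the first step can be sketched with base R graphics: draw a single uniform sample of size n = 100 and inspect its histogram and normal quantile plot.

```r
set.seed(81706)
n <- 100
x <- runif(n)         # one sample of size 100 from Uniform(0, 1)
hist(x)               # roughly flat, nothing like a bell curve
qqnorm(x); qqline(x)  # normal quantile plot of the raw draws
```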
The quantile plot isn't linear at all. Let's see how taking multiple samples changes this.
Now let's take 10 samples on new uniform distributions and see what shape the means take.
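A minimal sketch of this step, assuming each "sample" is a fresh uniform draw of size n = 100 and we keep the mean of each one; `replicate` makes this a one-liner.

```r
set.seed(81706)
means_10 <- replicate(10, mean(runif(100)))  # ns = 10 sample means
hist(means_10)
qqnorm(means_10); qqline(means_10)  # only 10 points, so still quite ragged
```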
Not a very linear quantile plot; the data still doesn't look very normal.
Let's ramp up to ns = 30, the rule-of-thumb sample count I learned in my high school AP Statistics class.
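The same sketch with ns raised to 30 (only the replication count changes):

```r
set.seed(81706)
means_30 <- replicate(30, mean(runif(100)))  # ns = 30 sample means
hist(means_30)
qqnorm(means_30); qqline(means_30)
```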
Looks slightly more linear and normal, but I'm not satisfied.
Let's go a little higher.
This looks rather linear/normal, actually.
Let's double it.
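Doubling brings us to ns = 100 (the count referenced just below); the pattern is unchanged.

```r
set.seed(81706)
means_100 <- replicate(100, mean(runif(100)))  # ns = 100 sample means
hist(means_100)  # a valley in the middle can appear just from binning a modest sample
qqnorm(means_100); qqline(means_100)
```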
That's a very normal-looking set of data, except for the odd valley in the middle of the histogram.
If 100 samples fetch us good results, 500 should be nearly perfect.
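Same one-liner again at ns = 500:

```r
set.seed(81706)
means_500 <- replicate(500, mean(runif(100)))  # ns = 500 sample means
hist(means_500)
qqnorm(means_500); qqline(means_500)
```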
The quantile plot almost looks like a single straight line.
Finally, let's see if 1000 samples give us a perfect distribution.
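The final step, sketched the same way; `shapiro.test` wasn't part of the original presentation, but it offers a formal normality check to go with the plots (it accepts between 3 and 5000 observations).

```r
set.seed(81706)
means_1000 <- replicate(1000, mean(runif(100)))  # ns = 1000 sample means
hist(means_1000)
qqnorm(means_1000); qqline(means_1000)
shapiro.test(means_1000)  # Shapiro-Wilk test of normality on the means
```

By the CLT, each mean of 100 Uniform(0, 1) draws is approximately Normal(0.5, 1/1200), so the 1000 means should cluster tightly around 0.5.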
This is, essentially, as normal as a finite dataset can get.